Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BIGBIRD, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BIGBIRD is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having O(1) global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BIGBIRD drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.
translated by 谷歌翻译
Applying Machine learning to domains like Earth Sciences is impeded by the lack of labeled data, despite a large corpus of raw data available in such domains. For instance, training a wildfire classifier on satellite imagery requires curating a massive and diverse dataset, which is an expensive and time-consuming process that can span from weeks to months. Searching for relevant examples in over 40 petabytes of unlabelled data requires researchers to manually hunt for such images, much like finding a needle in a haystack. We present a no-code end-to-end pipeline, Curator, which dramatically minimizes the time taken to curate an exhaustive labeled dataset. Curator is able to search massive amounts of unlabelled data by combining self-supervision, scalable nearest neighbor search, and active learning to learn and differentiate image representations. The pipeline can also be readily applied to solve problems across different domains. Overall, the pipeline makes it practical for researchers to go from just one reference image to a comprehensive dataset in a diminutive span of time.
translated by 谷歌翻译
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
translated by 谷歌翻译
We demonstrate a Physics-informed Neural Network (PINN) based model for real-time health monitoring of a heat exchanger, that plays a critical role in improving energy efficiency of thermal power plants. A hypernetwork based approach is used to enable the domain-decomposed PINN learn the thermal behavior of the heat exchanger in response to dynamic boundary conditions, eliminating the need to re-train. As a result, we achieve orders of magnitude reduction in inference time in comparison to existing PINNs, while maintaining the accuracy on par with the physics-based simulations. This makes the approach very attractive for predictive maintenance of the heat exchanger in digital twin environments.
translated by 谷歌翻译
We study politeness phenomena in nine typologically diverse languages. Politeness is an important facet of communication and is sometimes argued to be cultural-specific, yet existing computational linguistic study is limited to English. We create TyDiP, a dataset containing three-way politeness annotations for 500 examples in each language, totaling 4.5K examples. We evaluate how well multilingual models can identify politeness levels -- they show a fairly robust zero-shot transfer ability, yet fall short of estimated human accuracy significantly. We further study mapping the English politeness strategy lexicon into nine languages via automatic translation and lexicon induction, analyzing whether each strategy's impact stays consistent across languages. Lastly, we empirically study the complicated relationship between formality and politeness through transfer experiments. We hope our dataset will support various research questions and applications, from evaluating multilingual models to constructing polite multilingual agents.
translated by 谷歌翻译
Bayesian Inference offers principled tools to tackle many critical problems with modern neural networks such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way for approximate Inference and to estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data, and provide uncertainty estimates which lead to better performance in downstream tasks.
translated by 谷歌翻译
共享控制可以通过协助执行用户意图来帮助进行远程处理的对象操纵。为此,需要稳健和及时的意图估计,这取决于行为观察。在这里,提出了意图估计框架,该框架使用自然目光和运动功能来预测当前的动作和目标对象。该系统在模拟环境中进行了训练和测试,并在相对混乱的场景中和双手中产生的拾音器和放置序列,另一方面可能是手动。验证是在不同的用户和手中进行的,实现了预测的准确性和优势。对单个特征的预测能力的分析表明,在当前动作的早期识别中,抓握触发器和目光的凝视特征的优势。在当前的框架中,可以将相同的概率模型用于并行和独立工作的两只手,而提出了基于规则的模型来识别所得的双人动作。最后,讨论了这种方法对更复杂,全行为操纵的局限性和观点。
translated by 谷歌翻译
我们研究了图结构识别的问题,即在时间序列之间恢复依赖图的图。我们将这些时间序列数据建模为线性随机网络动力学系统状态的组成部分。我们假设部分可观察性,其中仅观察到一个包含网络的节点子集的状态演变。我们设计了一个从观察到的时间序列计算的新功能向量,并证明这些特征是线性可分离的,即存在一个超平面,该超平面将与连接的节点成对相关的特征群体与与断开对相关的节点相关联。这使得可以训练各种分类器进行因果推理的功能。特别是,我们使用这些功能来训练卷积神经网络(CNN)。由此产生的因果推理机制优于最先进的W.R.T.样品复杂性。受过训练的CNN概括了结构上不同的网络(密集或稀疏)和噪声级别的轮廓。值得注意的是,他们在通过合成网络(随机图的实现)训练时也很好地概括了现实世界网络。最后,提出的方法始终以成对的方式重建图,也就是说,通过确定每对相应的时间序列中的每对节点中是否存在边缘或箭头或不存在箭头。这符合大规模系统的框架,在该系统中,网络中所有节点的观察或处理都令人难以置信。
translated by 谷歌翻译
模型不可知的元学习算法旨在从几个观察到的任务中推断出先验,然后可以使用这些任务来适应新任务,但很少有示例。鉴于在现有基准中产生的任务的固有多样性,最近的方法使用单独的可学习结构,例如层次结构或图形,以实现对先验的特定任务适应。尽管这些方法产生了明显更好的元学习者,但我们的目标是在异质任务分配包含具有挑战性的分布变化和语义差异时提高其性能。为此,我们介绍了CAML(对比知识增强的元学习),这是一种新颖的方法,用于知识增强的几次学习,它演变了知识图以有效地编码历史经验,并采用了对比性的蒸馏策略,以利用编码的知识来为基础学习者的任务感知调制。使用标准基准测试,我们在不同的几次学习方案中评估CAML的性能。除了标准的少量任务适应外,我们还考虑了我们的经验研究中更具挑战性的多域任务适应和少数数据集泛化设置。我们的结果表明,CAML始终胜过最知名的方法,并实现了改善的概括。
translated by 谷歌翻译
至于其他形式的AI,最近已经对不同用户同伙的性能差异进行了研究。在语音识别方面实现公平性的一种方法是(1)确定遭受低标准表现的说话者队列,以及(2)采取针对发现同类的公平性缓解措施。在本文中,我们使用产品规模的AI助手语音识别系统的数据报告了发现和缓解性能差异的初步发现。我们将基于地理和人口统计学信息的队列发现与一种更可扩展的方法进行比较,该方法将使用扬声器嵌入技术分组没有人类标签的说话者。为了缓解公平性,我们发现对代表性不足的队列的过度采样,以及通过其他输入变量对扬声器队列的建模,从而减少了表现和底部性能队列之间的差距,而不会降低整体识别精度。
translated by 谷歌翻译